Predictive Power Estimation Algorithm (PPEA) - A New Algorithm to Reduce Overfitting for Genomic Biomarker Discovery
Authors
Abstract
Toxicogenomics promises to aid in predicting adverse effects, understanding the mechanisms of drug action or toxicity, and uncovering unexpected or secondary pharmacology. However, modeling adverse effects using high-dimensional, high-noise genomic data is prone to overfitting. Models constructed from such data sets often consist of a large number of genes with no obvious functional relevance to the biological effect the model intends to predict, which can make the modeling results difficult to interpret. To address these issues, we developed a novel algorithm, the Predictive Power Estimation Algorithm (PPEA), which estimates the predictive power of each individual transcript through an iterative two-way bootstrapping procedure. By enforcing, in each iteration of modeling and testing, that the number of samples is larger than the number of transcripts, PPEA reduces the risk of overfitting. We show with three different case studies that: (1) PPEA can quickly derive a reliable rank order of the predictive power of individual transcripts in a relatively small number of iterations, (2) the top-ranked transcripts tend to be functionally related to the phenotype they are intended to predict, (3) using only the most predictive top-ranked transcripts greatly facilitates the development of multiplex assays, such as qRT-PCR, as biomarkers, and (4) more importantly, we demonstrated that a small number of genes identified from the top-ranked transcripts are highly predictive of phenotype, as their expression changes distinguished adverse from non-adverse effects of compounds in completely independent tests. Thus, we believe that the PPEA model effectively addresses the overfitting problem and can be used to facilitate genomic biomarker discovery for predictive toxicology and drug responses.
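The two-way bootstrapping idea summarized above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the classifier or how predictive power is scored, so the nearest-centroid classifier, the rule of crediting every sampled transcript with its model's out-of-bag accuracy, and the names `two_way_bootstrap_scores` and `nearest_centroid_accuracy` are all hypothetical. Each iteration bootstraps the samples (rows) and draws a small random subset of transcripts (columns), and it is skipped unless the distinct training samples outnumber the selected transcripts.

```python
import random
from collections import defaultdict


def nearest_centroid_accuracy(X, y, train_idx, test_idx, feats):
    """Hypothetical scoring model: fit class centroids on the training
    samples using only `feats`, then return accuracy on `test_idx`."""
    centroids = {}
    for label in set(y[i] for i in train_idx):
        rows = [X[i] for i in train_idx if y[i] == label]
        centroids[label] = [sum(r[f] for r in rows) / len(rows) for f in feats]
    correct = 0
    for i in test_idx:
        # Squared Euclidean distance from sample i to each class centroid.
        dists = {
            label: sum((X[i][f] - c[k]) ** 2 for k, f in enumerate(feats))
            for label, c in centroids.items()
        }
        correct += min(dists, key=dists.get) == y[i]
    return correct / len(test_idx)


def two_way_bootstrap_scores(X, y, n_iter=500, n_feats=5, seed=0):
    """Assumed PPEA-like loop (not the published algorithm): bootstrap the
    samples and draw a random transcript subset each iteration, then credit
    every drawn transcript with that model's out-of-bag accuracy.
    Returns transcripts ranked by mean credited accuracy, plus the scores."""
    rng = random.Random(seed)
    n_samples, n_transcripts = len(X), len(X[0])
    if n_feats >= n_samples:
        raise ValueError("n_feats must be smaller than the number of samples")
    totals, counts = defaultdict(float), defaultdict(int)
    for _ in range(n_iter):
        # Row bootstrap: draw n_samples indices with replacement.
        train = [rng.randrange(n_samples) for _ in range(n_samples)]
        oob = sorted(set(range(n_samples)) - set(train))
        # Keep only iterations where distinct training samples outnumber the
        # selected transcripts and both classes are present for training.
        if not oob or len(set(train)) <= n_feats or len({y[i] for i in train}) < 2:
            continue
        # Column draw: a small random subset of transcripts, without replacement.
        feats = rng.sample(range(n_transcripts), n_feats)
        acc = nearest_centroid_accuracy(X, y, train, oob, feats)
        for f in feats:
            totals[f] += acc
            counts[f] += 1
    scores = {f: totals[f] / counts[f] for f in counts}
    ranking = sorted(scores, key=scores.get, reverse=True)
    return ranking, scores
```

On a synthetic matrix in which only one transcript differs between the two classes, that transcript should rise to the top of `ranking`, because the models that include it score well on held-out samples while models built from noise transcripts hover near chance. Keeping `n_feats` small relative to the sample count is what mirrors the abstract's "more samples than transcripts" constraint.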
Similar resources
An Approach to Reducing Overfitting in FCM with Evolutionary Optimization
Fuzzy clustering methods are conveniently employed in constructing a fuzzy model of a system, but they need to tune some parameters. In this research, FCM is chosen for fuzzy clustering. Parameters such as the number of clusters and the value of fuzzifier significantly influence the extent of generalization of the fuzzy model. These two parameters require tuning to reduce the overfitting in the...
Frequency Estimation of Unbalanced Three-Phase Power System using a New LMS Algorithm
This paper presents a simple and easy implementable Least Mean Square (LMS) type approach for frequency estimation of three phase power system in an unbalanced condition. The proposed LMS type algorithm is based on a second order recursion for the complex voltage derived from Clarke's transformation which is proved in the paper. The proposed algorithm is real adaptive filter with real parameter...
A Novel Algorithm for Rotor Speed Estimation of DFIGs Using Machine Active Power based MRAS Observer
This paper presents a new algorithm based on Model Reference Adaptive System (MRAS) and its stability analysis for sensorless control of Doubly-Fed Induction Generators (DFIGs). The reference and adjustable models of the suggested observer are based on the active power of the machine. A hysteresis block is used in the structure of the adaptation mechanism, and the stability analysis is performe...
Presented a method for estimating the cost of software using PCA to reduce the size and with the help of data mining
These days, data mining one of the most significant issues. One field data mining is a mixture of computer science and statistics which is considerably limited due to increase in digital data and growth of computational power of computer. One of the domains of data mining is the software cost estimation category. In this article, classifying techniques of learning algorithm of machine ...
Harmonics Estimation in Power Systems using a Fast Hybrid Algorithm
In this paper a novel hybrid algorithm for harmonics estimation in power systems is proposed. The estimation of the harmonic components is a nonlinear problem due to the nonlinearity of phase of sinusoids in distorted waveforms. Most researchers implemented nonlinear methods to extract the harmonic parameters. However, nonlinear methods for amplitude estimation increase time of convergence. Hen...
Journal title:
Volume 6, Issue
Pages -
Publication date: 2011